Java - Unknown characters passing as [a-zA-Z0-9]*?

Posted by Twodordan on Stack Overflow
Published on 2011-01-13T14:44:17Z

Hello, I'm no expert in regex, but I need to parse some input I have no control over and filter out any strings that don't contain A-Z, a-z and/or 0-9.

When I run this,

Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); // fixed typo
if (!p.matcher(gottenData).matches())
    System.out.println(someData); // someData contains gottenData

certain strings made of spaces plus an unknown symbol somehow slip through the filter (gottenData is the red rectangle in the screenshot).

In case you're wondering, it DOES also display text; it's not all like that.

For now, I don't mind the [?] as long as the string also contains some text along with it.
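
One way to express that requirement is to ask whether the string contains at least one letter or digit, rather than whether the whole string is alphanumeric. The condition above, !matches() against ^[a-zA-Z0-9]*$, is true for every string that is not entirely alphanumeric, so a string made only of spaces or unknown symbols gets printed too. A minimal sketch, assuming the goal is to keep strings with at least one ASCII letter or digit; the class name AlnumFilter, the helper hasAlnum and the sample strings are made up for illustration:

```java
import java.util.regex.Pattern;

class AlnumFilter {
    // Matches a single ASCII letter or digit anywhere in the input.
    private static final Pattern HAS_ALNUM = Pattern.compile("[a-zA-Z0-9]");

    // Hypothetical helper: true when the string has at least one letter or digit.
    // find() looks for a match anywhere, unlike matches(), which needs the
    // whole string to fit the pattern.
    static boolean hasAlnum(String s) {
        return s != null && HAS_ALNUM.matcher(s).find();
    }

    public static void main(String[] args) {
        // Sample inputs (assumed): plain text, whitespace plus a no-break space,
        // an empty string, and text mixed with a space.
        String[] samples = { "Text", " \u00A0\n", "", "abc 123" };
        for (String gottenData : samples) {
            if (hasAlnum(gottenData)) {
                System.out.println("keep: [" + gottenData + "]");
            }
        }
        // prints "keep: [Text]" and "keep: [abc 123]"
    }
}
```

Note that the empty string matches ^[a-zA-Z0-9]*$ because * allows zero characters, while it is rejected by the find() check here.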

Please help.

[EDIT] As far as I can tell from the (very large) input, the [?]'s are either whitespace or nothing at all; maybe there's some sort of encoding issue, or perhaps it has something to do with #text nodes (the input is XML).
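
To find out what the [?] actually is, it can help to print the code points of the offending string instead of the string itself. A rough sketch, assuming the data is already a Java String; the class CodePointDump and the helper describe are hypothetical names, and the no-break space U+00A0 in the sample is only a guess at the kind of character involved, not something confirmed from the input:

```java
class CodePointDump {
    // Hypothetical helper: renders every code point as U+XXXX so that
    // invisible or unprintable characters become visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by 1 or 2 chars depending on whether cp is a surrogate pair.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample: a letter followed by a no-break space.
        String gottenData = "a\u00A0";
        System.out.println(describe(gottenData)); // prints "U+0061 U+00A0"
    }
}
```

If the dump shows a real character where the [?] appears, the console simply cannot draw it. String.trim() only strips characters up to U+0020, so it would not remove something like a no-break space.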

© Stack Overflow or respective owner
